AITopics | pl 0

Collaborating Authors

pl 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

Min, Marcus J., Ding, Yangruibo, Buratti, Luca, Pujar, Saurabh, Kaiser, Gail, Jana, Suman, Ray, Baishakhi

arXiv.org Artificial IntelligenceJan-16-2024

Code Large Language Models (Code LLMs) are being increasingly employed in real-life applications, so evaluating them is critical. While the conventional accuracy evaluates the performance of Code LLMs on a set of individual tasks, their self-consistency across different tasks is overlooked. Intuitively, a trustworthy model should be self-consistent when generating natural language specifications for its own code and generating code for its own specifications. Failure to preserve self-consistency reveals a lack of understanding of the shared semantics underlying natural language and programming language, and therefore undermines the trustworthiness of a model. In this paper, we first formally define the self-consistency of Code LLMs and then design a framework, IdentityChain, which effectively and efficiently evaluates the self-consistency and conventional accuracy of a model at the same time. We study eleven Code LLMs and show that they fail to preserve self-consistency, which is indeed a distinct aspect from conventional accuracy. Furthermore, we show that IdentityChain can be used as a model debugging tool to expose weaknesses of Code LLMs by demonstrating three major weaknesses that we identify in current models using IdentityChain. Our code is available at https://github.com/marcusm117/IdentityChain.

code llm, evaluation, pl 0, (15 more...)

arXiv.org Artificial Intelligence

2310.14053

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Enhancing Counterfactual Classification via Self-Training

Gao, Ruijiang, Biggs, Max, Sun, Wei, Han, Ligong

arXiv.org Machine LearningDec-8-2021

Unlike traditional supervised learning, in many settings only partial feedback is available. We may only observe outcomes for the chosen actions, but not the counterfactual outcomes associated with other alternatives. Such settings encompass a wide variety of applications including pricing, online marketing and precision medicine. A key challenge is that observational data are influenced by historical policies deployed in the system, yielding a biased data distribution. We approach this task as a domain adaptation problem and propose a self-training algorithm which imputes outcomes with categorical values for finite unseen actions in the observational data to simulate a randomized trial through pseudolabeling, which we refer to as Counterfactual Self-Training (CST). CST iteratively imputes pseudolabels and retrains the model. In addition, we show input consistency loss can further improve CST performance which is shown in recent theoretical analysis of pseudolabeling. We demonstrate the effectiveness of the proposed algorithms on both synthetic and real datasets.

algorithm, observational data, pseudolabel, (14 more...)

arXiv.org Machine Learning

2112.04461

Country:

North America > United States > Virginia (0.04)
Europe > United Kingdom > Wales (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Marketing (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback